Network-aware worker placement for wide-area streaming analytics
Authors
Abstract
Many organizations leverage Distributed Stream Processing systems (DPSs) to gain insights from data generated by different users and devices, e.g., Internet of Things (IoT) devices or user clicks on a website, at geographically distributed datacenters. The worker nodes in such environments are connected through Wide Area Network (WAN) links with varying delays and bandwidth. Therefore, minimizing the execution latency of a task while using enough bandwidth at lower cost to steer application traffic is a challenging task. In this paper, we formulate worker node placement for a geo-distributed DPS network as a multi-criteria decision-making problem. Then, we propose an additive weighting-based approach to solve it. Users can prioritize worker nodes according to network-relevant parameters. We also propose a framework that can be integrated with current DPSs to execute tasks. We test our approach on three widely used stream processing systems, i.e., Apache Spark, Storm, and Flink, using custom graphs adopted from real cloud providers. We run the streaming query of the Yahoo! benchmark on these DPSs. The experimental results show that our approach improves performance by up to 2.2x–7.2x for Spark, 1.2x–3.4x for Storm, and 1.4x–3.3x for Flink compared with other approaches, which makes it useful in practical environments.
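The abstract names an additive weighting-based approach but does not give its details. As a rough illustration only, the sketch below applies generic Simple Additive Weighting (SAW) to rank candidate worker nodes. The `Worker` fields, the three criteria (delay, bandwidth, cost), the normalization, the weights, and the sample values are all hypothetical assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    """Hypothetical worker-node description; the fields are assumptions."""
    name: str
    delay_ms: float        # lower is better (cost criterion)
    bandwidth_mbps: float  # higher is better (benefit criterion)
    cost_per_gb: float     # lower is better (cost criterion)


def saw_scores(workers, weights):
    """Score workers with generic Simple Additive Weighting.

    Each criterion is normalized to (0, 1]: benefit criteria as value / max,
    cost criteria as min / value. All values are assumed to be positive.
    """
    min_delay = min(w.delay_ms for w in workers)
    max_bw = max(w.bandwidth_mbps for w in workers)
    min_cost = min(w.cost_per_gb for w in workers)
    scores = {}
    for w in workers:
        scores[w.name] = (
            weights["delay"] * (min_delay / w.delay_ms)
            + weights["bandwidth"] * (w.bandwidth_mbps / max_bw)
            + weights["cost"] * (min_cost / w.cost_per_gb)
        )
    return scores


def rank_workers(workers, weights):
    """Return worker names ordered from highest to lowest SAW score."""
    scores = saw_scores(workers, weights)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up sample data, for illustration only.
WORKERS = [
    Worker("dc-a", delay_ms=20.0, bandwidth_mbps=100.0, cost_per_gb=0.10),
    Worker("dc-b", delay_ms=40.0, bandwidth_mbps=200.0, cost_per_gb=0.05),
    Worker("dc-c", delay_ms=80.0, bandwidth_mbps=50.0, cost_per_gb=0.20),
]

if __name__ == "__main__":
    latency_first = {"delay": 0.6, "bandwidth": 0.2, "cost": 0.2}
    bandwidth_first = {"delay": 0.2, "bandwidth": 0.6, "cost": 0.2}
    print(rank_workers(WORKERS, latency_first))
    print(rank_workers(WORKERS, bandwidth_first))
```

Changing the weights changes the ranking, which is one way a user-supplied priority over network parameters could steer placement. In this sample data, a latency-heavy weighting favors `dc-a`, while a bandwidth-heavy weighting favors `dc-b`.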
Similar resources
Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area
We present JetStream, a system that allows real-time analysis of large, widely-distributed changing data sets. Traditional approaches to distributed analytics require users to specify in advance which data is to be backhauled to a central location for analysis. This is a poor match for domains where available bandwidth is scarce and it is infeasible to collect all potentially useful data. JetSt...
Video Streaming across wide area networks
When we speak of on-line distribution we must first make clear a distinction between on-demand and programmed streaming of contents: in the first case the audience can choose what to play at any time on each screen, while in the second case the broadcaster decides the sequence of videos streamed on all audience screens. The choice among these two distribution schemes is purely functional to the...
Resource Aware Placement of Data Analytics Platform in Fog Computing
Fog computing is an extension of cloud computing right to the edge of the network, and seeks to minimize service latency and average response time in applications, thereby enhancing the end-user experience. However, there still is the need to define where the service should run for attaining maximum efficiency. By way of the work proposed in this paper, we seek to develop a resource-aware place...
Quorum Placement on Wide-Area Networks
Content distribution networks are the dominant technology for distributing shared media on today's Internet. There are several types of content for which these systems have been proven highly successful: static databases, streaming media, online gaming. At the same time this architecture is not appropriate for other types of applications such as transactional databases that require both strong c...
Wide Area Network Ecology
In an ideal world the need to provide data communications between facilities separated by a large ocean would be filled simply. One would estimate the bandwidth requirement, place an order with a global telecommunications company, then just hook up routers on each end and start using the link. Our experience was considerably more painful, primarily due to three factors: 1) The behavior of some ...
Journal
Journal title: Future Generation Computer Systems
Year: 2022
ISSN: 0167-739X, 1872-7115
DOI: https://doi.org/10.1016/j.future.2022.06.009